DOCUMENT RESUME

ED 201 667                                        TM 810 261

AUTHOR        Willson, Victor L.
TITLE         Robinson's Measure of Agreement as a Parallel Forms
              Reliability Coefficient.
PUB DATE      [77]
NOTE          11p.
EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   *Comparative Analysis; *Correlation; *Difficulty
              Level; *Mathematical Formulas; Simulation; Test
              Items; *Test Reliability
IDENTIFIERS   *Parallel Forms Reliability; *Robinsons Measure of
              Agreement



ABSTRACT

A major deficiency in classical test theory is the reliance on
Pearson product-moment (PPM) correlation concepts in the definition
of reliability. PPM measures are totally insensitive to first moment
differences in tests, which leads to the dubious assumption of
essential tau-equivalence. Robinson proposed a measure of agreement
that is sensitive to different test difficulty and gives a practical
statistic to estimate reliability in the presence of known form
variation in difficulty. Robinson's measure of agreement appears to
be a useful alternative to the generalizability coefficient, as it
provides a more conservative estimate of reliability under conditions
of parallel form differences in mean. This is likely to be especially
useful when examining interrater reliability when internal
consistency of the raters is poor. Robinson's measure does not seem
advantageous for highly reliable parallel tests such as are
encountered in standardized testing programs. A simulation study is
presented to illustrate the degree of the coefficient's sensitivity
to form difficulty variance. Robinson's measure of agreement and the
intraclass correlation are computed for each simulation and their
values are compared. (Author/RL)

Robinson's Measure of Agreement as a
Parallel Forms Reliability Coefficient



A major deficiency in classical test theory is the reliance on Pearson
product-moment (PPM) correlation concepts in the definition of reliability.
PPM measures are totally insensitive to first moment differences in tests,
which leads to the dubious assumption of essential tau-equivalence. Lord
and Novick (1968, p. 194) suggest that when tests are parallel except for
mean difficulty differences the researcher "may prefer use of the
conventional formula (8.8.2)". The formula they present for error variance
is



$$\sigma_e^2 = \sigma_Y^2 \,(1 - \rho) \tag{1}$$

estimated by

$$s_e^2 = s_y^2 \,(1 - r_{12}) \tag{2}$$

where $\sigma_e^2$ = population error variance,

$\sigma_Y^2$ = population score variance,

$\rho$ = parallel forms reliability,

$s_y^2$ = some pooled estimate of $s_{y_1}^2$ and $s_{y_2}^2$,

$Y_1, Y_2$ = random variable score at time 1 or 2,
$y_1, y_2$ = realizations of $Y_1, Y_2$ at times 1, 2,
$r_{12}$ = PPM between $y_1, y_2$.
It is clear that (1) and (2) do not account for nonparallelism in mean
difficulty since all parameters and statistics employed are first-moment
insensitive. This insensitivity has in recent years been shown to have
important consequences. This is most clearly demonstrated in latent trait
models (cf. [name illegible] and Cook, [year illegible]). Differential
parallel test difficulty will affect [word illegible] in criterion
referenced testing, mastery testing, and competency testing. Thus, a
reliability coefficient that is sensitive to mean difficulty differences
is needed.
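The first-moment insensitivity of (2) can be illustrated with a minimal
Python sketch. The scores are hypothetical, and the pooled variance is taken
as the simple average of the two sample variances, which is an assumption
(the text only calls for "some pooled estimate"). Subtracting a constant from
every score on the second form changes its mean difficulty but leaves the
estimate untouched.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def error_variance_estimate(y1, y2):
    """Equation (2): pooled score variance times (1 - r12)."""
    # Assumption: the pooled estimate is the plain average of the two variances.
    pooled = (variance(y1) + variance(y2)) / 2
    return pooled * (1 - pearson_r(y1, y2))


# Hypothetical scores for five examinees on two forms.
form1 = [10, 12, 15, 18, 20]
form2 = [11, 12, 16, 17, 21]
harder_form2 = [score - 5 for score in form2]  # same form, uniformly 5 points harder

same = error_variance_estimate(form1, form2)
shifted = error_variance_estimate(form1, harder_form2)
print(abs(same - shifted) < 1e-9)  # True: the mean shift leaves the estimate unchanged
```

Both the variances and the correlation are computed from deviations about
each form's own mean, so any constant difficulty difference cancels out.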
Procedures

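Robinson (1957) proposed a measure of agreement that is sensitive to
different test difficulty. He developed it in the context of K raters but
its application to K forms is identical:

$$\rho_a = 1 - \frac{\sum_{k=1}^{K} E\left[Y_{ik} - E_k(Y_{ik})\right]^2}{\sum_{k=1}^{K} E\left[Y_{ik} - E(Y_{ik})\right]^2} \tag{3}$$

where $E_k$ is the expectation over forms for person i and $E$ is the
expectation over persons and forms.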



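The sample estimate is

$$\hat{\rho}_a = 1 - \frac{\sum_i \sum_k (y_{ik} - \bar{y}_{i\cdot})^2}{\sum_i \sum_k (y_{ik} - \bar{y}_{\cdot\cdot})^2} \tag{4}$$

where i = ith person,

k = kth form, of K forms,

$\bar{y}_{i\cdot}$ = mean of person i across forms, and $\bar{y}_{\cdot\cdot}$ = grand mean.

This measure is quite similar to Kelley's (1921) eta-squared statistic
except the numerator of (4) is a sum of squares within person across
forms pooled across persons. The denominator is the total sum of squares.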
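The sample estimate in (4) can be sketched in a few lines of Python. This is
a minimal illustration, not the author's program; the function name
`robinson_agreement` and the small data sets are hypothetical. Scores are
held as one list per person with one entry per form.

```python
def robinson_agreement(scores):
    """Equation (4): 1 - (within-person SS across forms) / (total SS).

    `scores` is a list of per-person lists, one score per form.
    """
    n_forms = len(scores[0])
    all_scores = [y for person in scores for y in person]
    grand_mean = sum(all_scores) / len(all_scores)

    # Numerator: squared deviations from each person's own mean, pooled over persons.
    ss_within = 0.0
    for person in scores:
        person_mean = sum(person) / n_forms
        ss_within += sum((y - person_mean) ** 2 for y in person)

    # Denominator: total sum of squares about the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    return 1 - ss_within / ss_total


# Hypothetical data: 4 persons by 2 forms with identical scores on both forms.
identical = [[10, 10], [12, 12], [15, 15], [18, 18]]
# Same persons, but the second form is 3 points harder for everyone.
shifted = [[a, b - 3] for a, b in identical]

print(robinson_agreement(identical))      # 1.0
print(robinson_agreement(shifted) < 1.0)  # True
```

In the shifted data the two forms still correlate perfectly, yet the
measure falls below 1 because the constant difficulty difference enters the
within-person sum of squares.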

Robinson points out that this measure is formally related to the
intraclass correlation coefficient, $\rho_I$, which both Lord and Novick
(1968) and Cronbach, Gleser, Nanda, and Rajaratnam (1972) propose in
generalizing across subjects (and possibly forms). The relation is as
follows (Robinson, 1957):

$$\rho_a = \frac{\rho_I + 1}{2} \quad \text{for two forms,} \tag{5}$$

$$\rho_a = \frac{(K - 1)\,\rho_I + 1}{K} \quad \text{for } K \text{ forms.} \tag{6}$$

Computationally $\rho_a$ is preferable to the intraclass correlation on
a number of grounds: 1) $\rho_a$ is always positive or zero, never negative
as the intraclass correlation $\rho_I$ may become; 2) it is independent of
K, whereas $\rho_I$ is a function of K; 3) direct tests are available for
$\rho_a$, since it is a linear function of $\rho_I$, for which Fisher (1938)
provided distributional tests. Thus, Robinson's
measure of agreement complements the generalizability coefficient and gives
a practical statistic to estimate reliability in the presence of known
form variation in difficulty.
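The linear relation in (5) and (6) can be written as a pair of conversion
functions. The sketch below is illustrative only; the function names are
hypothetical and `k` stands for the number of forms. The loop evaluates (6)
at the value of the intraclass correlation that the relation maps to zero
agreement, which shows how a negative intraclass correlation corresponds to
a non-negative measure of agreement.

```python
def agreement_from_intraclass(rho_i, k):
    """Equation (6): rho_a = ((k - 1) * rho_I + 1) / k; k = 2 gives equation (5)."""
    return ((k - 1) * rho_i + 1) / k


def intraclass_from_agreement(rho_a, k):
    """Equation (6) solved for the intraclass correlation."""
    return (k * rho_a - 1) / (k - 1)


for k in (2, 3, 6):
    # Value of rho_I that equation (6) maps to rho_a = 0 for k forms.
    rho_i_at_zero_agreement = -1 / (k - 1)
    rho_a = agreement_from_intraclass(rho_i_at_zero_agreement, k)
    print(k, abs(rho_a) < 1e-12)  # True for each k
```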

Tests of Significance. From Fisher (1934) the significance test for the
intraclass correlation coefficient $\rho_I$ is given as

$$F = \frac{1 + (n - 1)\,\rho_I}{1 - \rho_I} \tag{7}$$

This F-statistic is compared with a tabled value with K - 1 and K(n - 1)
degrees of freedom for level alpha. This is termed F critical. Then,
using (6) and (7), the critical value of $\rho_a$ for significance from
zero is

$$\rho_{a,\text{critical}} = \frac{K - 1}{K} \left( \frac{F_{\text{critical}} - 1}{F_{\text{critical}} + (n - 1)} \right) + \frac{1}{K} \tag{8}$$
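The step from (7) to (8) can be checked numerically. The sketch below
assumes that (7) has the form F = [1 + (n - 1) rho_I] / (1 - rho_I), which is
the form that yields the bracketed term in (8) when solved for the intraclass
correlation. The function names are hypothetical, `k` is the number of forms,
`n` is used as in (7), and the F value is an illustrative number rather than
one looked up in a table.

```python
def f_from_intraclass(rho_i, n):
    """Assumed form of equation (7): F = (1 + (n - 1) * rho_I) / (1 - rho_I)."""
    return (1 + (n - 1) * rho_i) / (1 - rho_i)


def critical_agreement(f_critical, n, k):
    """Equation (8): critical value of rho_a for significance from zero."""
    # Equation (7) solved for rho_I at F = F critical.
    rho_i_critical = (f_critical - 1) / (f_critical + (n - 1))
    # Equation (6) applied to the critical intraclass correlation.
    return (k - 1) / k * rho_i_critical + 1 / k


# Hypothetical inputs: f_crit is illustrative, not a tabled value.
f_crit, n, k = 2.5, 50, 2
rho_a_crit = critical_agreement(f_crit, n, k)

# Round trip: convert back through (6) and (7) and recover F critical.
rho_i_back = (k * rho_a_crit - 1) / (k - 1)
print(abs(f_from_intraclass(rho_i_back, n) - f_crit) < 1e-9)  # True
```

An observed value of the measure of agreement above the critical value
would then be judged significantly different from zero at the chosen level.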

Simulation study. A simulation study is presented to acquaint the reader
with the degree of the coefficient's sensitivity to form difficulty
variance. For sets of 50 scores the difficulty of the forms was varied by
adding a constant amount to each score in a given form. Results are
presented in Tables 1-3 for form internal consistencies of .90, .70, and
.50. That is, for internal consistency .90 all forms shared the same true
scores, which comprised 90% of the within form variance. Each score in the
second through sixth form was increased in value by 1%, 2%, 5%, or 10% of
the total form population variance to produce unequal form means.
Robinson's measure of agreement and the intraclass correlation were then
computed for each simulation. A total of seventy-five runs was made
(5 levels of form by 5 levels of mean difference by 3 levels of internal
consistency). Inspection of Tables 1 to 3 leads one to conclude that
differences are small for highly internally consistent forms (about a .02
difference for coefficient alpha = .90). For forms with moderate internal
consistency (.70) the Robinson measure is typically about .05 lower than
the intraclass correlation. For low internal consistency (.50) the Robinson
measure is typically .12 lower than the intraclass correlation for 2 or 3
forms, and the difference drops to about .07 for 5 or 6 forms. There
appears to be no greater difference between the coefficients with greater
difference in form means, although the reliability generally drops with
greater difference in forms for Robinson's measure. The simulation is
merely indicative of the analytical results.
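A simulation of this general kind can be sketched as follows. This is not
the original program: the normal score model, unit total variance per form,
the fixed seed, and the rule that the same constant is added to every form
after the first are all assumptions made for illustration, and the function
names are hypothetical. Only Robinson's measure is computed here.

```python
import random


def robinson_agreement(scores):
    """Equation (4): 1 - (within-person SS across forms) / (total SS)."""
    n_forms = len(scores[0])
    all_scores = [y for person in scores for y in person]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_within = sum(
        (y - sum(person) / n_forms) ** 2 for person in scores for y in person
    )
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    return 1 - ss_within / ss_total


def simulate_forms(n_persons, n_forms, reliability, shift, seed):
    """Draw scores as true score + error with unit total variance per form.

    Assumption: `reliability` is the share of form variance due to true
    scores, and `shift` is added to every score on forms after the first.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        true_score = rng.gauss(0.0, reliability ** 0.5)
        row = []
        for form in range(n_forms):
            error = rng.gauss(0.0, (1.0 - reliability) ** 0.5)
            row.append(true_score + error + (shift if form > 0 else 0.0))
        data.append(row)
    return data


# Shifts of 0%, 1%, 2%, 5%, and 10% of the (unit) form variance, 50 scores per form.
for shift in (0.0, 0.01, 0.02, 0.05, 0.10):
    data = simulate_forms(n_persons=50, n_forms=3, reliability=0.70, shift=shift, seed=1)
    print(f"shift={shift:.2f}  rho_a={robinson_agreement(data):.3f}")
```

Reusing the seed keeps the underlying scores fixed across shift levels, so
any change in the printed values is due to the added constant alone.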

Discussion 

Robinson's measure of agreement appears to be a useful alternative
to the generalizability coefficient, as it provides a more conservative
estimate of reliability under conditions of parallel form differences in
mean. This is likely to be especially useful when examining interrater
reliability when internal consistency of the raters is poor. Robinson's
measure does not seem advantageous for highly reliable parallel tests
such as are encountered in standardized testing programs.






Table 1: Simulation results for Robinson's Measure of Agreement and Intraclass
Correlation, Coefficient Alpha = .90 for each Form.

Form difference                          Number of Forms
as % of σ²      2            3            4            5            6

 0%     ρ_a     .955         [illegible]  [illegible]  [illegible]  [illegible]
        ρ_I     .983         .951         .947         .939         .932

 1%             .945         [illegible]  .927         .930         .905
                .973         .95[?]       .945         .944         .921

 2%             .949         .92[?]       .910         .924         .925
                .975         .9[?][?]     .932         .939         .937

 5%             .950         .9[?][?]     .92[?]       .899         .912
                .980         .95[?]       .94[?]       .919         .927

10%             .971         .911         .908         .916         .837
                .985         [illegible]  .931         .933         .854
Note 1: Top number is Robinson's measure of agreement, bottom number is the
intraclass correlation for each form pair.

Note 2: Each form had 50 observations.

Table 2: Simulation results for Robinson's measure of agreement and intraclass
correlation, coefficient alpha = .70 for each form.

Form difference                     Number of Forms
as % of σ²      2         3         4         5         6

 0%     ρ_a     .875      .855      .583      .801      .76[?]
        ρ_I     .938      .904      .753      .841      [illegible]

 1%             .841      .790      .772      .718      .755
                .921      .850      .829      .775      .795

 2%             .859      .810      .774      .813      .759
                .929      .873      .830      .850      .799

 5%             .872      .810      .717      .754      .788
                .935      .873      .787      .81[?]    .[?]24

10%             .810      .771      .748      .772      .[?]37
                .905      .847      .811      .818      .755
Note 1: Top number is Robinson's measure of agreement, bottom number is the
intraclass correlation.

Note 2: Each form had 50 observations.

Table 3: Simulation results for Robinson's measure of agreement and intraclass
correlation, coefficient alpha = .50 for each form.

Form difference                     Number of Forms
as % of σ²      2         3         4         5         6

 0%     ρ_a     .652      .712      .561      .622      .622
        ρ_I     .826      .808      .671      .697      .685

 1%             .662      .619      .494      .551      .517
                .831      .746      .620      .641      .597

 2%             .829      .605      .630      .633      .606
                .915      .737      .722      .706      .672

 5%             .818      .591      .558      .552      .586
                .909      .727      .668      .721      .655

10%             .751      .546      .581      .555      .552
                .88[?]    .697      .686      .644      .626

Note 1: Top number is Robinson's measure of agreement, bottom number is the
intraclass correlation.

Note 2: Each form had 50 observations.